AITopics | detection and recognition

Fare: Failure Resilience in Learned Visual Navigation Control

Wang, Zishuo, Loo, Joel, Hsu, David

arXiv.org Artificial Intelligence, Oct-29-2025

While imitation learning (IL) enables effective visual navigation, IL policies are prone to unpredictable failures in out-of-distribution (OOD) scenarios. We advance the notion of failure-resilient policies, which not only detect failures but also recover from them automatically. Failure recognition that identifies the factors causing failure is key to informing recovery. We present Fare, a framework to construct failure-resilient IL policies, embedding OOD detection and recognition in them without using explicit failure data, and pairing them with recovery heuristics. Real-world experiments show that Fare enables failure recovery across two different policy architectures, enabling robust long-range navigation in complex environments.

Visual navigation is an attractive approach to robot navigation, leveraging rich visual information from low-cost sensors [1]. Imitation learning (IL) has emerged as a key method to learn visual navigation policies [2]-[4], but is inherently limited by training data. IL policies may fail unpredictably on inputs outside the training distribution, often without clear explanation [5]-[7]. This work develops a mechanism to enable IL policies to detect and recover from failures, supporting robust open-world navigation.

artificial intelligence, machine learning, recognition, (18 more...)

arXiv.org Artificial Intelligence

2510.2468

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Sign Language: Towards Sign Understanding for Robot Autonomy

Agrawal, Ayush, Loo, Joel, Zimmerman, Nicky, Hsu, David

arXiv.org Artificial Intelligence, Sep-17-2025

Navigational signs are common aids for human wayfinding and scene understanding, but are underutilized by robots. We argue that they benefit robot navigation and scene understanding by directly encoding privileged information on actions, spatial regions, and relations. Interpreting signs in open-world settings remains a challenge owing to the complexity of scenes and signs, but recent advances in vision-language models (VLMs) make this feasible. To advance progress in this area, we introduce the task of navigational sign understanding, which parses locations and associated directions from signs. We offer a benchmark for this task, proposing appropriate evaluation metrics and curating a test set capturing signs with varying complexity and design across diverse public spaces, from hospitals to shopping malls to transport hubs. We also provide a baseline approach using VLMs, and demonstrate their promise on navigational sign understanding. Code and dataset are available on GitHub.

machine learning, natural language, navigational sign, (20 more...)

arXiv.org Artificial Intelligence

2506.02556

Genre: Research Report (0.40)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

PatrolVision: Automated License Plate Recognition in the wild

Singhal, Anmol, Singhal, Navya

arXiv.org Artificial Intelligence, Apr-16-2025

Adoption of AI-driven techniques in public services remains low due to challenges related to the accuracy and speed of information at population scale. Computer vision techniques for traffic monitoring have not gained much popularity despite their relative strength in areas such as autonomous driving. Despite the large number of academic methods for Automatic License Plate Recognition (ALPR), very few provide an end-to-end solution for patrolling in the city. This paper presents a novel prototype for a low-power, GPU-based patrolling system to be deployed on surveillance vehicles in an urban environment for automated vehicle detection, recognition and tracking. We propose a complete ALPR system for Singapore license plates, covering both single-line and double-line plates, built on our own YOLO-based network. We focus on unconstrained capture scenarios, as would be the case in real-world applications, where the license plate (LP) might be considerably distorted due to oblique views. We first detect the license plate from the full image using RFB-Net and rectify multiple distorted license plates in a single image. The detected license plate image is then fed to our network for character recognition. We evaluate the performance of our proposed system on a newly built dataset covering more than 16,000 images. The system correctly detected license plates with 86% precision and recognized the characters of a license plate in 67% of the test set, reaching 89% accuracy when one incorrect character is allowed (partial match). We also test the latency of our system and achieve 64 FPS on a Tesla P4 GPU.
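The difference between the exact-match and partial-match figures quoted above can be illustrated with a minimal sketch. The abstract only says that a partial match tolerates one incorrect character, so the handling of length mismatches, the function names `char_errors` and `plate_accuracy`, and the sample plate strings below are all assumptions made for illustration, not the authors' evaluation code.

```python
def char_errors(predicted: str, truth: str) -> int:
    """Count wrong characters position by position.

    Assumption: a missing or extra character counts as one error each.
    """
    mismatches = sum(p != t for p, t in zip(predicted, truth))
    return mismatches + abs(len(predicted) - len(truth))


def plate_accuracy(pairs, max_errors=0):
    """Fraction of (predicted, truth) pairs with at most max_errors wrong characters."""
    if not pairs:
        return 0.0
    hits = sum(char_errors(p, t) <= max_errors for p, t in pairs)
    return hits / len(pairs)


# Made-up plate strings, for illustration only.
samples = [
    ("SBA1234A", "SBA1234A"),  # exact match
    ("SBA1284A", "SBA1234A"),  # one wrong character
    ("SGX9876", "SGK9875"),    # two wrong characters
    ("SJD555", "SJD5555"),     # one missing character
]

exact = plate_accuracy(samples, max_errors=0)    # every character must be right
partial = plate_accuracy(samples, max_errors=1)  # one wrong character tolerated
print(exact, partial)
```

Raising `max_errors` can only keep or increase the score, which is why a partial-match figure is never lower than the exact-match one on the same test set.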

artificial intelligence, machine learning, optical character recognition, (15 more...)

arXiv.org Artificial Intelligence

2504.1081

Country:

South America > Brazil (0.28)
Asia > Singapore (0.25)

Genre: Research Report (0.64)

Industry:

Information Technology (0.35)
Transportation (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.71)

DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Xie, Yu, Qiao, Qian, Gao, Jun, Wu, Tianxiang, Huang, Shaoyao, Fan, Jiaqing, Cao, Ziqiang, Wang, Zili, Zhang, Yue, Zhang, Jielei, Sun, Huyang

arXiv.org Artificial Intelligence, Aug-1-2024

A growing number of end-to-end text spotting methods based on the Transformer architecture have demonstrated superior performance. These methods utilize a bipartite graph matching algorithm to perform one-to-one optimal matching between predicted objects and actual objects. However, the instability of bipartite graph matching can lead to inconsistent optimization targets, thereby affecting the training performance of the model. Existing literature applies denoising training to solve the problem of bipartite graph matching instability in object detection tasks. Unfortunately, this denoising training method cannot be directly applied to text spotting, which requires irregular-shape detection and a text recognition task that is more complex than classification. To address this issue, we propose a novel denoising training method (DNTextSpotter) for arbitrary-shaped text spotting. Specifically, we decompose the queries of the denoising part into noised positional queries and noised content queries. We use the four Bezier control points of the Bezier center curve to generate the noised positional queries. For the noised content queries, considering that outputting the text in a fixed positional order is not conducive to aligning position with content, we employ a masked character sliding method to initialize noised content queries, thereby assisting in the alignment of text content and position. To improve the model's perception of the background, we further utilize an additional loss function for background character classification in the denoising training part. Although DNTextSpotter is conceptually simple, it outperforms the state-of-the-art methods on four benchmarks (Total-Text, SCUT-CTW1500, ICDAR15, and Inverse-Text), notably yielding an improvement of 11.3% over the best approach on the Inverse-Text dataset.
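As a rough illustration of what "four Bezier control points" describe, the sketch below evaluates a cubic Bezier curve and produces a noised copy of its control points. The uniform noise, the `scale` parameter, the helper names and the sample coordinates are assumptions for illustration only; the abstract does not specify how DNTextSpotter generates its noise.

```python
import random


def cubic_bezier(ctrl, t):
    """Evaluate a cubic Bezier curve with four (x, y) control points at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    u = 1.0 - t
    # Bernstein basis polynomials of degree 3.
    b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3
    y = b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3
    return (x, y)


def jitter_control_points(ctrl, scale, rng):
    """Return a noised copy: each coordinate shifted uniformly in [-scale, scale].

    Hypothetical noise model, not taken from the paper.
    """
    return [
        (x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))
        for x, y in ctrl
    ]


# Made-up center curve of a curved text instance.
center_curve = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
rng = random.Random(0)  # seeded so the sketch is repeatable
noised = jitter_control_points(center_curve, scale=0.1, rng=rng)

midpoint = cubic_bezier(center_curve, 0.5)
noised_midpoint = cubic_bezier(noised, 0.5)
print(midpoint, noised_midpoint)
```

The curve always starts at the first control point and ends at the last, so perturbing the four points shifts the whole text center line while keeping it smooth.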

dntextspotter, query, recognition, (14 more...)

arXiv.org Artificial Intelligence

2408.00355

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Enhanced Review Detection and Recognition: A Platform-Agnostic Approach with Application to Online Commerce

Karmakar, Priyabrata, Hawkins, John

arXiv.org Artificial Intelligence, May-8-2024

Online commerce relies heavily on user generated reviews to provide unbiased information about products that they have not physically seen. The importance of reviews has attracted multiple exploitative online behaviours and requires methods for monitoring and detecting reviews. We present a machine learning methodology for review detection and extraction, and demonstrate that it generalises for use across websites that were not contained in the training data. This method promises to drive applications for automatic detection and evaluation of reviews, regardless of their source. Furthermore, we showcase the versatility of our method by implementing and discussing three key applications for analysing reviews: Sentiment Inconsistency Analysis, which detects and filters out unreliable reviews based on inconsistencies between ratings and comments; Multi-language support, enabling the extraction and translation of reviews from various languages without relying on HTML scraping; and Fake review detection, achieved by integrating a trained NLP model to identify and distinguish between genuine and fake reviews.
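The idea behind Sentiment Inconsistency Analysis, comparing a review's star rating with the sentiment of its comment, can be sketched as follows. This is a minimal sketch under stated assumptions: the sentiment scores are taken to be numbers in [-1, 1] produced by some separate model, and the linear rating mapping, the `tolerance` threshold and the function names are hypothetical rather than the authors' method.

```python
def rating_to_polarity(stars, max_stars=5):
    """Map a star rating in [1, max_stars] linearly onto [-1, 1]."""
    return 2.0 * (stars - 1) / (max_stars - 1) - 1.0


def is_inconsistent(stars, comment_sentiment, tolerance=1.0):
    """Flag a review whose rating and comment sentiment disagree by more than tolerance.

    comment_sentiment is assumed to lie in [-1, 1]; tolerance is an arbitrary choice.
    """
    return abs(rating_to_polarity(stars) - comment_sentiment) > tolerance


# Made-up reviews with precomputed (hypothetical) sentiment scores.
reviews = [
    {"stars": 5, "sentiment": 0.8},   # glowing rating, positive text
    {"stars": 5, "sentiment": -0.7},  # glowing rating, negative text
    {"stars": 1, "sentiment": -0.9},  # poor rating, negative text
    {"stars": 2, "sentiment": 0.9},   # poor rating, positive text
]

kept = [r for r in reviews if not is_inconsistent(r["stars"], r["sentiment"])]
print(len(kept), "of", len(reviews), "reviews kept")
```

A stricter filter would lower `tolerance`, at the cost of discarding more reviews whose text is only mildly at odds with the rating.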

detection, online review, platform, (14 more...)

arXiv.org Artificial Intelligence

2405.06704

Country:

Oceania > Australia (0.04)
South America (0.04)
North America > Central America (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Information Technology > Services (0.48)
Retail (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)

Intelligent Robotic Control System Based on Computer Vision Technology

Che, Chang, Zheng, Haotian, Huang, Zengyi, Jiang, Wei, Liu, Bo

arXiv.org Artificial Intelligence, Apr-1-2024

Computer vision is a kind of simulation of biological vision using computers and related equipment. It is an important part of the field of artificial intelligence. Its research goal is to give computers the ability to recognize three-dimensional environmental information through two-dimensional images. Computer vision draws on image processing, signal processing, probability and statistical analysis, computational geometry, neural networks, machine learning theory and computer information processing, and works through computer analysis and processing of visual information.

The article explores the intersection of computer vision technology and robotic control, highlighting its importance in various fields such as industrial automation, healthcare, and environmental protection. Computer vision technology, which simulates human visual observation, plays a crucial role in enabling robots to perceive and understand their surroundings, leading to advancements in tasks like autonomous navigation, object recognition, and waste management. By integrating computer vision with robot control, robots gain the ability to interact intelligently with their environment, improving efficiency, quality, and environmental sustainability.

garbage, robot, vision technology, (15 more...)

arXiv.org Artificial Intelligence

2404.01116

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Water & Waste Management > Solid Waste Management (1.00)
Information Technology (1.00)
Health & Medicine (1.00)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR

Turnbull, Robert, Mannix, Evelyn

arXiv.org Artificial Intelligence, Jan-23-2024

Highlights: first place in the character recognition challenge; second place in the character detection challenge; best recall and precision results for detection and recognition at IoU 0.5; prediction results released in multiple formats for 4,500+ Oxyrhynchus Papyri images.

The capacity to isolate and recognize individual characters from facsimile images of papyrus manuscripts yields rich opportunities for digital analysis. For this reason the 'ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri' was held as part of the 17... We used an ensemble of YOLOv8 models to detect and classify individual characters and employed two different approaches for refining the character predictions, including a transformer-based DeiT approach and a ResNet-50 model trained on a large corpus of unlabelled data using SimCLR, a self-supervised learning method. Our submission won the recognition challenge with a mean average precision (mAP) of 42.2%, and was runner-up in the detection challenge with a mAP of 51.4%. At the more relaxed intersection over union (IoU) threshold of 0.5, we achieved the highest mean average precision and mean average recall results for both detection and classification. We ran our prediction pipeline on more than 4,500 images from the Oxyrhynchus Papyri to illustrate the utility of our approach, and we release the results publicly in multiple formats.
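The intersection over union threshold of 0.5 mentioned above decides when a predicted character box counts as a hit. The following is a minimal sketch of that test for axis-aligned boxes; the `(x_min, y_min, x_max, y_max)` box layout and the function names are assumptions for illustration, not the competition's scoring code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_match(predicted_box, truth_box, threshold=0.5):
    """A prediction matches a ground-truth box when their IoU reaches the threshold."""
    return iou(predicted_box, truth_box) >= threshold


# Made-up boxes: the prediction covers the top half of the ground-truth box.
truth = (0.0, 0.0, 4.0, 4.0)
prediction = (0.0, 0.0, 4.0, 2.0)
print(iou(prediction, truth), is_match(prediction, truth))
```

A higher threshold demands tighter localisation, which is why scores at IoU 0.5 are described in the abstract as the more relaxed setting.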

oxyrhynchus papyri, papyri, submission, (13 more...)

arXiv.org Artificial Intelligence

2401.12513

Country:

Oceania > Australia (0.14)
North America > Canada > Ontario > Toronto (0.14)
Africa > Middle East > Egypt (0.05)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Horizontal Federated Computer Vision

Mandal, Paul K., Leo, Cole, Hurley, Connor

arXiv.org Artificial Intelligence, Dec-30-2023

In the modern world, the amount of visual data recorded has been rapidly increasing. In many cases, data is stored in geographically distinct locations and thus requires a large amount of time and space to consolidate. Sometimes, there are also regulations for privacy protection which prevent data consolidation. In this work, we present federated implementations for object detection and recognition using a federated Faster R-CNN (FRCNN) and image segmentation using a federated Fully Convolutional Network (FCN). Our FRCNN was trained on 5000 examples of the COCO2017 dataset while our FCN was trained on the entire train set of the CamVid dataset. The proposed federated models address the challenges posed by the increasing volume and decentralized nature of visual data, offering efficient solutions in compliance with privacy regulations.
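Federated training keeps data at each location and combines only model parameters. The abstract does not say which aggregation rule the authors use, so the sketch below shows one common choice, federated averaging, in which each client's parameters are weighted by its number of training examples. The flat parameter lists, client sizes and function name are made up for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter lists, weighting each client by its example count.

    This is plain federated averaging, shown as an assumed aggregation rule.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical clients holding locally trained parameters.
clients = [[1.0, 2.0], [3.0, 6.0]]
sizes = [100, 300]  # number of local training examples per client

global_weights = federated_average(clients, sizes)
print(global_weights)
```

Only the parameter lists and example counts leave each client in this scheme; the raw images stay where they were recorded.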

federate, federated learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2401.0039

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Enhancing Vehicle Entrance and Parking Management: Deep Learning Solutions for Efficiency and Security

Ramzan, Muhammad Umer, Ali, Usman, Naqvi, Syed Haider Abbas, Aslam, Zeeshan, Tehseen, null, Ali, Husnain, Faheem, Muhammad

arXiv.org Artificial Intelligence, Dec-5-2023

The automated management of vehicle entrance and parking in any organization is a complex challenge encompassing record-keeping, efficiency, and security concerns. Manual methods for tracking vehicles and finding parking spaces are slow and wasteful. To address this problem, we have used state-of-the-art deep learning models to automate the process of vehicle entrance and parking for any organization. For security, our system integrates vehicle detection, license number plate verification, and face detection and recognition models to verify that the person and vehicle are registered with the organization. We trained multiple deep learning models for vehicle detection, license number plate detection, face detection, and recognition; however, the YOLOv8n model outperformed all the other models. Furthermore, license plate recognition is facilitated by Google's Tesseract-OCR Engine. By integrating these technologies, the system offers efficient vehicle detection, precise identification, streamlined record keeping, and optimized parking slot allocation in buildings, thereby enhancing convenience, accuracy, and security. Future research opportunities lie in fine-tuning system performance for a wide range of real-world applications.

detection, recognition, vehicle, (15 more...)

arXiv.org Artificial Intelligence

2312.02699

Country: Asia > Pakistan (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.88)
Transportation > Ground > Road (0.88)
Transportation > Infrastructure & Services (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

Feng, Hao, Wang, Zijian, Tang, Jingqun, Lu, Jinghui, Zhou, Wengang, Li, Houqiang, Huang, Can

arXiv.org Artificial Intelligence, Sep-2-2023

In the era of Large Language Models (LLMs), tremendous strides have been made in the field of multimodal understanding. However, existing advanced algorithms fall short of effectively utilizing the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks in text-rich scenarios have not been sufficiently explored. In this work, we introduce UniDoc, a novel multimodal model equipped with text detection and recognition capabilities, which existing approaches lack. Moreover, UniDoc capitalizes on the beneficial interactions among tasks to enhance the performance of each individual task. To implement UniDoc, we perform unified multimodal instruction tuning on the contributed large-scale instruction-following datasets. Quantitative and qualitative experimental results show that UniDoc sets state-of-the-art scores across multiple challenging benchmarks. To the best of our knowledge, this is the first large multimodal model capable of simultaneous text detection, recognition, spotting, and understanding.

instruction, recognition, unidoc, (13 more...)

arXiv.org Artificial Intelligence

2308.11592

Country:

Asia > China (0.04)
Asia > Vietnam (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)